from IPython.display import HTML
HTML(
"""<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>"""
)
import sys
import os
if os.path.basename(os.getcwd()) == "notebooks":
os.chdir("../")
%load_ext autoreload
%autoreload 2
# General
from pathlib import Path
import warnings
# Data & modeling
import numpy as np
import pandas as pd
pd.set_option("display.max_columns", 500)
import lightgbm as lgb
from scipy.stats import randint, uniform
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge
from sklearn.metrics import (
mean_absolute_error,
mean_squared_error,
f1_score,
confusion_matrix,
)
from sklearn.neural_network import MLPRegressor, MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MaxAbsScaler, MinMaxScaler
# Plotting
import matplotlib.pyplot as plt
import plotly
import plotly.graph_objects as go
# Custom package
from storpdm import DATA_PATH
from storpdm.make_dataset import download_dataset, import_dataset
from storpdm.visualize import (
visualise_sensor_correlation_all_engine,
visualise_sensor_data_distribution,
plot_time_history_all_engines,
visualise_sensor_correlation_double,
interactive_rul_series,
actual_vs_pred,
display_roc_pr,
display_tp_fp,
plot_confusion_matrix,
plot_prob_RUL,
)
from storpdm.build_features import (
find_correlated_data,
list_correlated_data,
find_time_independent_columns,
add_calculated_rul,
prepare_training_data,
)
from storpdm.utils import metrics_threshold
from storpdm.train_model import EstimatorSelectionHelper
Turbofan engines are gas turbine engines often used in aircraft propulsion. In this document, we analyze data from such engines. The aim is to model and predict the Remaining Useful Life (RUL) as accurately as possible. In a real-life setup, such a model can be used for:
The diagram below summarizes the structure of a turbofan engine:
The ambient air goes through the fan and is split into the inner and outer chambers. Air goes through the Low-Pressure Compressor (LPC) and the High-Pressure Compressor (HPC) before combustion occurs. (TO BE FILLED)
The data set consists of multiple multivariate time series. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with different degrees of initial wear and manufacturing variation which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise.
The engine is operating normally at the start of each time series, and develops a fault at some point during the series: the fault grows in magnitude until system failure.
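The raw C-MAPSS files are plain space-separated text with no header row. A minimal loader sketch of that layout (the column names and sensor count here are assumptions for illustration; the notebook's `import_dataset` handles the real files internally):

```python
import io
import pandas as pd

# Hypothetical column layout for a C-MAPSS-style file: engine id, cycle,
# three operational settings, then the sensor channels.
def load_cmapss(buffer, n_sensors=3):
    cols = ["id", "cycle", "set1", "set2", "set3"] + [
        f"sensor_{i}" for i in range(1, n_sensors + 1)
    ]
    # Files are whitespace-separated with no header row.
    return pd.read_csv(buffer, sep=r"\s+", header=None, names=cols)

# Tiny synthetic example standing in for a file like train_FD001.txt
raw = "1 1 0.0 0.0 100.0 641.8 1589.7 1400.6\n" \
      "1 2 0.0 0.0 100.0 642.1 1591.8 1403.1\n"
df = load_cmapss(io.StringIO(raw))
print(df.shape)  # (2, 8)
```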
The general modeling outline is described in the diagram below and is as follows:
In this notebook we model the engine failures from two different points of view, trying to answer the following questions:
How many cycles are left until the engine fails? In this case, we deal with regression models.
Will the engine fail in less than 10 cycles? What is the probability? This case falls into the category of classification modeling.
Let's download the data to a local folder and unzip it
## Run the download function only once
if len(list(Path("data/raw").glob("train_*.txt"))) == 4:
print("Raw data has been downloaded")
else:
download_dataset()
# Load datasets and display
df_rul, df_train, df_test = import_dataset(filename="FD001")
display(df_train)
Now that the data is loaded, we do the following pre-processing:
After the data cleaning, we visualize the distribution of the remaining features to make sure nothing stands out (outliers or unexpected behaviors).
# Reduce / Eliminate highly-correlated sensors.
correlation_threshold = 0.99
correlated_data = find_correlated_data(df_train, correlation_threshold)
columns_to_be_removed = list_correlated_data(correlated_data)
print(f"Removing {columns_to_be_removed} from columns")
df_train_proc = df_train.drop(columns_to_be_removed, axis=1)
subplot_titles = ("Before removing high correlations", "After removing correlations")
fig = visualise_sensor_correlation_double(df_train, df_train_proc, subplot_titles)
fig.show()
On the left plot, we observe that some features are highly correlated. On the right plot, we confirm they have been removed.
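A minimal sketch of the idea behind `find_correlated_data` (the names and logic here are illustrative, not the package's actual implementation): compute the absolute correlation matrix, keep only its upper triangle so each pair is inspected once, and flag one column of every pair above the threshold.

```python
import numpy as np
import pandas as pd

def columns_above_threshold(df, threshold=0.99):
    corr = df.corr().abs()
    # Keep the upper triangle only, so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]

# Tiny example: b is an exact linear function of a, c is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({"a": a, "b": 2 * a + 1, "c": rng.normal(size=100)})
print(columns_above_threshold(df))  # ['b']
```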
Next, let's plot the distribution of the sensor data.
plt.figure(figsize=(15, 15))
with warnings.catch_warnings(record=True):
fig = visualise_sensor_data_distribution(df_train_proc)
It seems like some values are constant. We could have guessed this by looking at the raw table. We clean those constant columns too.
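The constant-column check can be sketched as follows (a simplified stand-in for `find_time_independent_columns`, whose internals are not shown here):

```python
import pandas as pd

def constant_columns(df):
    # A column with a single unique value carries no information
    return [col for col in df.columns if df[col].nunique() <= 1]

df = pd.DataFrame({"cycle": [1, 2, 3], "flat": [7.0, 7.0, 7.0], "s1": [0.1, 0.3, 0.2]})
print(constant_columns(df))  # ['flat']
```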
# Find columns that do not change with time.
time_independent_columns = find_time_independent_columns(df_train_proc)
print(f"Removing constant columns: {time_independent_columns}")
df_train_proc2 = df_train_proc.drop(time_independent_columns, axis=1)
# Add Remaining Useful Life (RUL) to dataset.
df_train_proc2 = add_calculated_rul(df_train_proc2)
display(df_train_proc2)
The table above is clean and ready to be used. Let's visualize some of the time series separately. On the X axis, we show the Remaining Useful Life (RUL). As we approach RUL=0, the engine is about to fail.
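The RUL label is derived per engine: every training series runs to failure, so the remaining life at a given row is the engine's last observed cycle minus the current cycle. A sketch of the idea behind `add_calculated_rul` (its actual implementation may differ):

```python
import pandas as pd

def add_rul(df):
    # The last cycle of each engine is the failure point; RUL counts down to 0
    max_cycle = df.groupby("id")["cycle"].transform("max")
    return df.assign(RUL=max_cycle - df["cycle"])

df = pd.DataFrame({"id": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
print(add_rul(df)["RUL"].tolist())  # [2, 1, 0, 1, 0]
```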
On the graph below, select the variable to plot in the dropdown menu. Variable PhysCoreSpeed seems quite relevant, as its values diverge as the RUL decreases.
fig = interactive_rul_series(df_train_proc2, id_filter=[66, 12, 34])
fig.show()
We perform a comparison of 3 different models with several combinations of hyperparameters and automatically select the best one:
When selecting a model, we respect the principles of cross-validation to avoid overfitting.
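The comparison can be sketched with plain `GridSearchCV` loops, which is essentially the pattern a helper like `EstimatorSelectionHelper` wraps (the helper's internals are an assumption; the models below are stand-ins):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

models = {"ridge": Ridge(), "tree": DecisionTreeRegressor(random_state=0)}
params = {"ridge": {"alpha": [0.1, 1.0]}, "tree": {"max_depth": [2, 4]}}

results = {}
for name, model in models.items():
    # Each candidate gets its own cross-validated hyperparameter search
    gs = GridSearchCV(model, params[name], scoring="neg_root_mean_squared_error", cv=3)
    gs.fit(X, y)
    results[name] = (gs.best_score_, gs.best_estimator_)

# Pick the model family with the best cross-validated score
best = max(results, key=lambda k: results[k][0])
print(best)  # the data is linear, so 'ridge' should win
```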
# Define models
models = {
"linear": Ridge(normalize=True),
"lgb": lgb.LGBMRegressor(),
"mlp": Pipeline(
[("std", MinMaxScaler()), ("m1", MLPRegressor(early_stopping=True))]
),
}
# Define hyperparam search space
params = {
"linear": {},
"lgb": {
"n_estimators": [100, 500],
"learning_rate": [0.005, 0.01],
"max_depth": [5, 10, 15, 20],
},
"mlp": {
"m1__hidden_layer_sizes": [(128,), (64, 32)],
"m1__learning_rate_init": [0.001, 0.005, 0.01],
},
}
# Split train and test
X_train, X_test, y_train, y_test = prepare_training_data(
df=df_train_proc2, target_col="RUL", discard_cols="id"
)
%%time
estim_grid = EstimatorSelectionHelper(models, params)
estim_grid.fit(
X_train, y_train, scoring="neg_root_mean_squared_error", n_jobs=2, cv=None
)
Below we show the top 5 models.
model = estim_grid.best_model
print(f"Winner model: {model}")
print(f"Total number of models: {estim_grid.score_summary().shape[0]}")
estim_grid.score_summary().head()
y_test_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_test_pred)
rmse = mean_squared_error(y_test, y_test_pred, squared=False)
print(f"Test Mean Absolute Error (MAE): {mae:.3f}")
print(f"Test Root Mean Squared Error (RMSE): {rmse:.3f}")
The "test" error is what we can expect to see on new unseen data.
Let's draw the actual RUL versus the predictions. On the left scatter plot, ideally, they should be equal (i.e. fall on the diagonal line). Generally speaking, the spread of the point cloud grows with the magnitude of the remaining useful life.
On the right plot, we show the actual vs predicted RUL for a selection of a couple of engines.
actual_vs_pred(model, X_test, y_test, df_train_proc2)
Next, we analyze the trained model and infer how each of the sensors contributes to the decrease of RUL. Conclusions are stated at the bottom.
# SHAP feature importance
import shap
data_shap = shap.kmeans(X_train, 50)
explainer = shap.KernelExplainer(model.predict, data=data_shap)
X_sample = X_train.sample(100)
with warnings.catch_warnings(record=True):
shap_values = explainer.shap_values(X_sample, l1_reg="num_features(100)")
# summarize the effects of all the features
shap.summary_plot(shap_values, X_sample)
From the SHAP plot above we conclude that:
StaticHPCOutletPres: higher values of pressure indicate lower RUL.
Similar conclusions are drawn from the Partial Dependence Plots (PDP) below.
from pdpbox import pdp, get_dataset, info_plots
features = [
"cycle",
"StaticHPCOutletPres",
"PhysCoreSpeed",
"LPCOutletTemp",
"FuelFlowRatio",
]
for f in features:
pdp_goals = pdp.pdp_isolate(
model=model, dataset=X_sample, model_features=X_train.columns, feature=f
)
pdp.pdp_plot(
pdp_goals, f, x_quantile=False, plot_pts_dist=True, plot_lines=True, center=True
)
Here we take the clean dataset but look at the same problem from a different perspective. Instead of modeling "number of cycles", we model the probability that the engine fails within 10 cycles. This alternative approach could be very useful in practice to generate real-time alerts.
The approach we follow is very similar to the one above, except that now we use classification models:
We train the 3 types of models with different combinations of hyperparameters, respecting the principles of cross-validation to avoid overfitting. The best model is chosen according to the F1 score.
cycle_threshold = 10
df_train_proc2["RUL_thres"] = np.where(df_train_proc2["RUL"] <= cycle_threshold, 1, 0)
# Define models
models = {
"lgb": lgb.LGBMClassifier(),
"logistic": LogisticRegression(max_iter=300),
"mlp": Pipeline(
[("std", MinMaxScaler()), ("m1", MLPClassifier(early_stopping=True))]
),
}
# Define hyperparam search space
params = {
"lgb": {
"n_estimators": [100, 500],
"learning_rate": [0.005, 0.01],
"max_depth": [5, 10, 15, 20],
},
"mlp": {
"m1__hidden_layer_sizes": [(128,), (64, 32)],
"m1__learning_rate_init": [0.001, 0.005, 0.01],
},
"logistic": {},
}
X_train, X_test, y_train, y_test = prepare_training_data(
df=df_train_proc2, target_col="RUL_thres", discard_cols=["id", "RUL"]
)
%%time
estim_grid_clf = EstimatorSelectionHelper(models, params)
estim_grid_clf.fit(X_train, y_train, scoring="f1", n_jobs=2, cv=None)
Below we show the top 5 models. Again, in this case, tree-based models outperform neural networks and logistic regression, both in accuracy and in training time.
estim_grid_clf.score_summary().head()
model = estim_grid_clf.best_model
y_test_pred = model.predict(X_test)
f1 = f1_score(y_test, y_test_pred)
print(f"F1 score test: {f1:.3f}")
Next, we graph the predicted probability of failure given by the model versus the actual remaining useful life, for one of the engines. The model outputs reasonable probabilities on most occasions. Its confidence grows as the failure time approaches.
plot_prob_RUL(model, df_train_proc2, engine=96)
Classification models return a probability of failure. If we define a threshold, we can translate this probability into a binary variable: 1 indicates failure, 0 indicates no failure. Below, we select a few thresholds and compare how many failures are predicted correctly (true positives) versus how many false positives we observe. A false positive means we predict "failure" when, in fact, the engine did not fail.
There is a certain trade-off between true and false positives. Depending on the cost of a false positive and the benefit of a true positive, we can establish a threshold to be used in practice.
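The trade-off can be computed directly from the predicted probabilities (a simplified stand-in for `metrics_threshold`; names are illustrative):

```python
import numpy as np

def tp_fp_at_thresholds(y_true, y_prob, thresholds):
    y_true = np.asarray(y_true)
    rows = []
    for t in thresholds:
        # Predict "failure" whenever the probability clears the threshold
        y_pred = (np.asarray(y_prob) >= t).astype(int)
        tp = int(((y_pred == 1) & (y_true == 1)).sum())
        fp = int(((y_pred == 1) & (y_true == 0)).sum())
        rows.append((t, tp, fp))
    return rows

y_true = [1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.6, 0.55, 0.2, 0.4, 0.1]
for t, tp, fp in tp_fp_at_thresholds(y_true, y_prob, [0.3, 0.5, 0.7]):
    print(t, tp, fp)
# Raising the threshold removes false positives but also misses real failures
```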
# Predict for test set
y_score = model.predict_proba(X_test)
metrics = metrics_threshold(y_score[:, 1], y_test)
# Display true positives vs false positives
display_tp_fp(
thres=metrics["thres"],
tp=metrics["tp"],
fp=metrics["fp"],
title="",
fig=None,
i=0,
name1="Failures correctly predicted (true positives)",
name2="False positives",
)
It is interesting to look at the ratios too, not just the absolute values. We define the following metrics, which are universal for all classification problems. The reader is referred to [this article](https://en.wikipedia.org/wiki/Precision_and_recall) for further reading:
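These ratios are simple functions of the confusion-matrix counts; a minimal sketch:

```python
def rates(tp, fp, tn, fn):
    # Precision: of the alarms we raised, how many were real failures?
    precision = tp / (tp + fp)
    # Recall (true positive rate): of the real failures, how many did we catch?
    recall = tp / (tp + fn)
    # False positive rate: of the healthy cases, how many raised a false alarm?
    fpr = fp / (fp + tn)
    return precision, recall, fpr

print(rates(tp=8, fp=2, tn=90, fn=4))  # (0.8, ~0.667, ~0.022)
```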
Looking at the ROC / precision-recall graph below is another way of choosing an optimal threshold based on the business needs. The table shows the same data as the graph.
display_roc_pr(
precision=metrics["prec"],
recall=metrics["rec"],
fpr=metrics["fpr"],
thres=metrics["thres"],
).show()
# Display table too
display(
pd.DataFrame(
{
"thresholds": metrics["thres"],
"precision": metrics["prec"],
"recall": metrics["rec"],
"fpr": np.round(metrics["fpr"], 3),
"TP": metrics["tp"],
"TN": metrics["tn"],
"FP": metrics["fp"],
"FN": metrics["fn"],
}
)
.loc[::10]
.loc[::-1]
.round(3)
)
For completeness, let's look at the confusion matrix on the test set, with a threshold of 0.5.
confusionMatrix = confusion_matrix(y_test, y_test_pred)
plot_confusion_matrix(
confusionMatrix,
normalize=False,
classes=["No failure (N)", "Failure within 10 cycles (P)"],
)
From the data point of view, this problem could be improved in two ways:
From the modeling point of view, further model architectures could be explored, especially the ones that take advantage of inter-temporal relationships.
On a practical level, it would be interesting to explore further classification problems. For example: will the engine fail in less than 10 cycles, in between 10 and 20, or in more than 20? This could be implemented in practice as a red-yellow-green traffic light.
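Such a three-way label could start from a simple binning of the RUL, e.g. with `pd.cut` (the bin edges below follow the example thresholds; the labels are illustrative):

```python
import pandas as pd

rul = pd.Series([3, 8, 12, 19, 25, 40])
# Red: fails within 10 cycles; yellow: between 10 and 20; green: more than 20
labels = pd.cut(rul, bins=[-1, 10, 20, float("inf")], labels=["red", "yellow", "green"])
print(labels.tolist())  # ['red', 'red', 'yellow', 'yellow', 'green', 'green']
```

The resulting categorical column can then replace the binary `RUL_thres` target in the same grid-search pipeline, turning the problem into multi-class classification.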